Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
It is well known that quantifying uncertainty in the action-value estimates
is crucial for efficient exploration in reinforcement learning. Ensemble
sampling offers a relatively computationally tractable way of doing this using
randomized value functions. However, it still requires substantial
computational resources for complex problems. In this paper, we present an
alternative, computationally efficient way to induce exploration using index
sampling. We use an indexed value function to represent uncertainty in our
action-value estimates. We first present an algorithm to learn a parameterized
indexed value function through a distributional version of temporal difference
learning in a tabular setting and prove its regret bound. Then, from a
computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks
(PINs), comprising one mean network and one uncertainty network to learn the
indexed value function. Finally, we show the efficacy of PINs through
computational experiments.
Comment: 17 pages, 4 figures, Proceedings of the 34th AAAI Conference on
Artificial Intelligence.
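The abstract does not give a concrete form for the indexed value function. A minimal sketch of the idea, assuming an additive mean-plus-uncertainty parameterization with a Gaussian index (the function names and the specific parameterization are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def indexed_q(mu, sigma, z):
    """Indexed action-value estimate: mean plus index-scaled uncertainty.

    mu, sigma: arrays of shape (n_actions,) holding the mean and
    uncertainty estimates for each action (the roles played by the two
    networks in PINs); z is a scalar index sampled once per episode.
    """
    return mu + sigma * z

# Index sampling: draw one index per episode, then act greedily with
# respect to the resulting indexed values.
mu = np.array([1.0, 1.2, 0.9])     # mean estimates
sigma = np.array([0.1, 0.5, 0.8])  # uncertainty estimates
z = rng.standard_normal()          # index sampled for this episode
action = int(np.argmax(indexed_q(mu, sigma, z)))
```

Because actions with large `sigma` can win the argmax for some draws of `z`, the agent explores uncertain actions without maintaining a full ensemble.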
Reinforcement Learning, Bit by Bit
Reinforcement learning agents have demonstrated remarkable achievements in
simulated environments. Data efficiency poses an impediment to carrying this
success over to real environments. The design of data-efficient agents calls
for a deeper understanding of information acquisition and representation. We
develop concepts and establish a regret bound that together offer principled
guidance. The bound sheds light on questions of what information to seek, how
to seek that information, and what information to retain. To illustrate these
concepts, we design simple agents that build on them and present computational
results that demonstrate improvements in data efficiency.
Epistemic Neural Networks
Intelligence relies on an agent's knowledge of what it does not know. This
capability can be assessed based on the quality of joint predictions of labels
across multiple inputs. Conventional neural networks lack this capability and,
since most research has focused on marginal predictions, this shortcoming has
been largely overlooked. We introduce the epistemic neural network (ENN) as an
interface for models that represent uncertainty as required to generate useful
joint predictions. While prior approaches to uncertainty modeling such as
Bayesian neural networks can be expressed as ENNs, this new interface
facilitates comparison of joint predictions and the design of novel
architectures and algorithms. In particular, we introduce the epinet: an
architecture that can supplement any conventional neural network, including
large pretrained models, and can be trained with modest incremental computation
to estimate uncertainty. With an epinet, conventional neural networks
outperform very large ensembles, consisting of hundreds or more particles, with
orders of magnitude less computation. We demonstrate this efficacy across
synthetic data, ImageNet, and some reinforcement learning tasks. As part of
this effort, we open-source our experiment code.
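The abstract describes the epinet only at a high level. A minimal sketch of the interface it implies, assuming the epinet is a small network that takes the base network's features together with a sampled epistemic index (all names, shapes, and the linear forms are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def base_net(x, w):
    # Conventional network; a linear map stands in for it here.
    return x @ w

def epinet(features, z, w_epi):
    # Small supplementary network whose output varies with the
    # epistemic index z; this variation expresses uncertainty.
    return (features @ w_epi) * z

def enn_predict(x, z, w, w_epi):
    # ENN interface: prediction depends on input x AND index z.
    features = x  # in practice, features taken from the base net
    return base_net(x, w) + epinet(features, z, w_epi)

x = rng.standard_normal(4)
w = rng.standard_normal((4, 3))
w_epi = rng.standard_normal((4, 3))
# Joint predictions arise by sampling several indices for the same
# inputs and collecting the resulting outputs.
preds = [enn_predict(x, z, w, w_epi) for z in rng.standard_normal(10)]
```

Only the small epinet depends on `z`, which is why the added uncertainty estimation costs little on top of a large pretrained base network.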
The Neural Testbed: Evaluating Joint Predictions
Predictive distributions quantify uncertainties ignored by point estimates.
This paper introduces The Neural Testbed: an open-source benchmark for
controlled and principled evaluation of agents that generate such predictions.
Crucially, the testbed assesses agents not only on the quality of their
marginal predictions per input, but also on their joint predictions across many
inputs. We evaluate a range of agents using a simple neural network data
generating process. Our results indicate that some popular Bayesian deep
learning agents do not fare well with joint predictions, even when they can
produce accurate marginal predictions. We also show that the quality of joint
predictions drives performance in downstream decision tasks. We find these
results are robust across a wide range of generative models, and
highlight the practical importance of joint predictions to the community.
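A joint prediction is scored on a tuple of labels at once rather than one label at a time. A minimal sketch of such a joint log-loss, assuming the agent exposes sampled predictive functions (the function name and the sampling-based form are illustrative assumptions, not the testbed's exact metric):

```python
import numpy as np

def joint_log_loss(agent_probs, labels):
    """Negative log-likelihood of a whole tuple of labels.

    agent_probs: array (n_samples, tau, n_classes) of class
    probabilities from n_samples functions drawn from the agent's
    posterior, evaluated at tau inputs.
    labels: array (tau,) of true class indices.
    """
    n_samples, tau, _ = agent_probs.shape
    # Likelihood of the full label tuple under each sampled function,
    # averaged over samples before taking the log.
    per_sample = np.prod(agent_probs[:, np.arange(tau), labels], axis=1)
    return -np.log(per_sample.mean())

# Example: two sampled functions, tau = 3 inputs, 2 classes.
probs = np.full((2, 3, 2), 0.5)       # uniform predictions
labels = np.array([0, 1, 0])
loss = joint_log_loss(probs, labels)  # = 3 * log(2) ~ 2.079 here
```

An agent with accurate marginals can still score poorly here if its sampled functions fail to capture dependence between inputs, which is the gap the testbed is designed to expose.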